Overview

Dataset statistics

Number of variables: 14
Number of observations: 49997
Missing cells: 8715
Missing cells (%): 1.2%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.7 MiB
Average record size in memory: 98.0 B
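
The missing-cell percentage follows from the counts above: cells are variables times observations. A minimal sketch of that arithmetic, with the three input figures copied from the dataset statistics (the variable names are illustrative only):

```python
# Figures copied from the dataset statistics above.
n_variables = 14
n_observations = 49997
missing_cells = 8715

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```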

Variable types

Numeric: 8
Categorical: 2
DateTime: 2
Boolean: 2

Alerts

avg_surge is highly correlated with surge_pct (High correlation)
surge_pct is highly correlated with avg_surge (High correlation)
avg_rating_of_driver has 8120 (16.2%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
trips_in_first_30_days has 15388 (30.8%) zeros (Zeros)
surge_pct has 34407 (68.8%) zeros (Zeros)
weekday_pct has 9202 (18.4%) zeros (Zeros)

Reproduction

Analysis started: 2022-01-11 04:08:02.470920
Analysis finished: 2022-01-11 04:08:15.866956
Duration: 13.4 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 49997
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25000.20503
Minimum: 0
Maximum: 49999
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2500.8
Q1: 12500
Median: 25001
Q3: 37500
95-th percentile: 47499.2
Maximum: 49999
Range: 49999
Interquartile range (IQR): 25000

Descriptive statistics

Standard deviation: 14433.87743
Coefficient of variation (CV): 0.5773503622
Kurtosis: -1.200046285
Mean: 25000.20503
Median Absolute Deviation (MAD): 12500
Skewness: -4.679185283 × 10^-5
Sum: 1249935251
Variance: 208336817.7
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
33351 | 1 | < 0.1%
33329 | 1 | < 0.1%
33330 | 1 | < 0.1%
33331 | 1 | < 0.1%
33332 | 1 | < 0.1%
33333 | 1 | < 0.1%
33334 | 1 | < 0.1%
33335 | 1 | < 0.1%
33336 | 1 | < 0.1%
Other values (49987) | 49987 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
7 | 1 | < 0.1%
8 | 1 | < 0.1%
9 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
49999 | 1 | < 0.1%
49998 | 1 | < 0.1%
49997 | 1 | < 0.1%
49996 | 1 | < 0.1%
49995 | 1 | < 0.1%
49994 | 1 | < 0.1%
49993 | 1 | < 0.1%
49992 | 1 | < 0.1%
49991 | 1 | < 0.1%
49990 | 1 | < 0.1%
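
The Uniform alert on df_index is consistent with its descriptive statistics: for a uniform distribution starting at zero, the coefficient of variation is close to 1/√3 ≈ 0.577 and the excess kurtosis is close to -1.2. A rough check under a simplifying assumption: the sketch uses a gap-free index 0..49999, whereas the report has 49997 rows, so three index values are absent and the figures differ slightly from those reported.

```python
import math

# Assumption: a gap-free index 0..49999 (the report itself has 49997 rows).
values = list(range(50000))
n = len(values)

mean = sum(values) / n
# Sample standard deviation (divides by n - 1).
std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
cv = std / mean

# Excess kurtosis from population central moments.
m2 = sum((v - mean) ** 2 for v in values) / n
m4 = sum((v - mean) ** 4 for v in values) / n
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"CV = {cv:.4f}, 1/sqrt(3) = {1 / math.sqrt(3):.4f}")
print(f"excess kurtosis = {excess_kurtosis:.4f}")
```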

city
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.7 KiB
Winterfell: 23336
Astapor: 16533
King's Landing: 10128

Length

Max length: 14
Median length: 10
Mean length: 9.818249095
Min length: 7
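
The length statistics can be reproduced from the three category counts alone, since every row holds one of three strings. A small sketch, with the counts copied from the city summary above:

```python
# Category counts copied from the city summary above.
counts = {"Winterfell": 23336, "Astapor": 16533, "King's Landing": 10128}

n = sum(counts.values())
lengths = {name: len(name) for name in counts}

# Count-weighted mean of the string lengths.
mean_length = sum(lengths[name] * c for name, c in counts.items()) / n
max_length = max(lengths.values())
min_length = min(lengths.values())

print(n, min_length, max_length, round(mean_length, 6))
```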

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: King's Landing
2nd row: Astapor
3rd row: Astapor
4th row: King's Landing
5th row: Winterfell

Common Values

Value | Count | Frequency (%)
Winterfell | 23336 | 46.7%
Astapor | 16533 | 33.1%
King's Landing | 10128 | 20.3%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
winterfell | 23336 | 38.8%
astapor | 16533 | 27.5%
king's | 10128 | 16.8%
landing | 10128 | 16.8%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

trips_in_first_30_days
Real number (ℝ≥0)

ZEROS

Distinct: 58
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.27583655
Minimum: 0
Maximum: 73
Zeros: 15388
Zeros (%): 30.8%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 3
95-th percentile: 9
Maximum: 73
Range: 73
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.752847648
Coefficient of variation (CV): 1.648997002
Kurtosis: 36.28161543
Mean: 2.27583655
Median Absolute Deviation (MAD): 1
Skewness: 4.636830746
Sum: 113785
Variance: 14.08386547
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
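Common values

Value | Count | Frequency (%)
0 | 15388 | 30.8%
1 | 14108 | 28.2%
2 | 7402 | 14.8%
3 | 3788 | 7.6%
4 | 2562 | 5.1%
5 | 1616 | 3.2%
6 | 1134 | 2.3%
7 | 819 | 1.6%
8 | 589 | 1.2%
9 | 471 | 0.9%
Other values (48) | 2120 | 4.2%

Minimum 10 values

Value | Count | Frequency (%)
0 | 15388 | 30.8%
1 | 14108 | 28.2%
2 | 7402 | 14.8%
3 | 3788 | 7.6%
4 | 2562 | 5.1%
5 | 1616 | 3.2%
6 | 1134 | 2.3%
7 | 819 | 1.6%
8 | 589 | 1.2%
9 | 471 | 0.9%

Maximum 10 values

Value | Count | Frequency (%)
73 | 1 | < 0.1%
71 | 1 | < 0.1%
63 | 1 | < 0.1%
58 | 1 | < 0.1%
56 | 2 | < 0.1%
55 | 2 | < 0.1%
54 | 2 | < 0.1%
53 | 2 | < 0.1%
51 | 1 | < 0.1%
50 | 1 | < 0.1%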
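
Several of the statistics reported for trips_in_first_30_days are derived from one another, which gives a quick consistency check: the mean is the sum over the row count, the CV is the standard deviation over the mean, the variance is the squared standard deviation, and the IQR and range are differences of quantiles. A sketch using only figures copied from the tables above:

```python
# Values copied from the trips_in_first_30_days statistics above.
total = 113785
n_observations = 49997
mean = 2.27583655
std = 3.752847648
q1, q3 = 0, 3
minimum, maximum = 0, 73

mean_check = total / n_observations   # mean = sum / count
cv = std / mean                       # coefficient of variation
variance = std ** 2                   # variance = std squared
iqr = q3 - q1                         # interquartile range
value_range = maximum - minimum

print(f"mean={mean_check:.6f} cv={cv:.6f} variance={variance:.6f}")
print(f"iqr={iqr} range={value_range}")
```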

signup_date
DateTime

Distinct: 31
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 390.7 KiB
Minimum: 2014-01-01 00:00:00
Maximum: 2014-01-31 00:00:00
Histogram with fixed size bins (bins=31)

avg_rating_of_driver
Real number (ℝ≥0)

MISSING

Distinct: 37
Distinct (%): 0.1%
Missing: 8120
Missing (%): 16.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.601549777
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3.4
Q1: 4.3
Median: 4.9
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.7

Descriptive statistics

Standard deviation: 0.6173427565
Coefficient of variation (CV): 0.1341597476
Kurtosis: 8.13775609
Mean: 4.601549777
Median Absolute Deviation (MAD): 0.1
Skewness: -2.428452353
Sum: 192699.1
Variance: 0.381112079
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
Common values

Value | Count | Frequency (%)
5 | 20770 | 41.5%
4 | 4193 | 8.4%
4.5 | 2498 | 5.0%
4.8 | 2430 | 4.9%
4.7 | 1945 | 3.9%
4.9 | 1771 | 3.5%
4.3 | 1487 | 3.0%
4.6 | 1143 | 2.3%
3 | 1003 | 2.0%
4.4 | 829 | 1.7%
Other values (27) | 3808 | 7.6%
(Missing) | 8120 | 16.2%

Minimum 10 values

Value | Count | Frequency (%)
1 | 256 | 0.5%
1.5 | 4 | < 0.1%
1.6 | 1 | < 0.1%
1.7 | 2 | < 0.1%
1.8 | 2 | < 0.1%
1.9 | 1 | < 0.1%
2 | 209 | 0.4%
2.1 | 6 | < 0.1%
2.2 | 1 | < 0.1%
2.3 | 22 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
5 | 20770 | 41.5%
4.9 | 1771 | 3.5%
4.8 | 2430 | 4.9%
4.7 | 1945 | 3.9%
4.6 | 1143 | 2.3%
4.5 | 2498 | 5.0%
4.4 | 829 | 1.7%
4.3 | 1487 | 3.0%
4.2 | 601 | 1.2%
4.1 | 398 | 0.8%

avg_surge
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 115
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.074765886
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1.05
95-th percentile: 1.38
Maximum: 8
Range: 7
Interquartile range (IQR): 0.05

Descriptive statistics

Standard deviation: 0.2223420846
Coefficient of variation (CV): 0.2068748994
Kurtosis: 77.27725293
Mean: 1.074765886
Median Absolute Deviation (MAD): 0
Skewness: 6.821169517
Sum: 53735.07
Variance: 0.04943600257
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
1 | 34452 | 68.9%
1.25 | 1100 | 2.2%
1.13 | 956 | 1.9%
1.02 | 809 | 1.6%
1.08 | 798 | 1.6%
1.04 | 774 | 1.5%
1.06 | 770 | 1.5%
1.05 | 704 | 1.4%
1.03 | 619 | 1.2%
1.07 | 616 | 1.2%
Other values (105) | 8399 | 16.8%

Minimum 10 values

Value | Count | Frequency (%)
1 | 34452 | 68.9%
1.01 | 484 | 1.0%
1.02 | 809 | 1.6%
1.03 | 619 | 1.2%
1.04 | 774 | 1.5%
1.05 | 704 | 1.4%
1.06 | 770 | 1.5%
1.07 | 616 | 1.2%
1.08 | 798 | 1.6%
1.09 | 412 | 0.8%

Maximum 10 values

Value | Count | Frequency (%)
8 | 1 | < 0.1%
5.75 | 1 | < 0.1%
5 | 5 | < 0.1%
4.75 | 1 | < 0.1%
4.5 | 4 | < 0.1%
4.25 | 5 | < 0.1%
4 | 12 | < 0.1%
3.75 | 5 | < 0.1%
3.63 | 1 | < 0.1%
3.5 | 9 | < 0.1%

last_trip_date
DateTime

Distinct: 182
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 390.7 KiB
Minimum: 2014-01-01 00:00:00
Maximum: 2014-07-01 00:00:00
Histogram with fixed size bins (bins=50)

phone
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 395
Missing (%): 0.8%
Memory size: 390.7 KiB
iPhone: 34581
Android: 15021

Length

Max length: 7
Median length: 6
Mean length: 6.302830531
Min length: 6

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: iPhone
2nd row: Android
3rd row: iPhone
4th row: iPhone
5th row: Android

Common Values

Value | Count | Frequency (%)
iPhone | 34581 | 69.2%
Android | 15021 | 30.0%
(Missing) | 395 | 0.8%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
iphone | 34581 | 69.7%
android | 15021 | 30.3%

Most occurring characters

Value | Count | Frequency (%)
No values found.

Most occurring categories

Value | Count | Frequency (%)
No values found.

Most frequent character per category

Most occurring scripts

Value | Count | Frequency (%)
No values found.

Most frequent character per script

Most occurring blocks

Value | Count | Frequency (%)
No values found.

Most frequent character per block

surge_pct
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 367
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.849778987
Minimum: 0
Maximum: 100
Zeros: 34407
Zeros (%): 68.8%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 8.6
95-th percentile: 50
Maximum: 100
Range: 100
Interquartile range (IQR): 8.6

Descriptive statistics

Standard deviation: 19.95931578
Coefficient of variation (CV): 2.255346242
Kurtosis: 10.43613913
Mean: 8.849778987
Median Absolute Deviation (MAD): 0
Skewness: 3.144040526
Sum: 442462.4
Variance: 398.3742865
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 34407 | 68.8%
100 | 1416 | 2.8%
50 | 1367 | 2.7%
33.3 | 1152 | 2.3%
25 | 906 | 1.8%
20 | 790 | 1.6%
16.7 | 708 | 1.4%
14.3 | 533 | 1.1%
12.5 | 439 | 0.9%
11.1 | 393 | 0.8%
Other values (357) | 7886 | 15.8%

Minimum 10 values

Value | Count | Frequency (%)
0 | 34407 | 68.8%
0.4 | 1 | < 0.1%
0.5 | 3 | < 0.1%
0.6 | 1 | < 0.1%
0.7 | 5 | < 0.1%
0.8 | 5 | < 0.1%
0.9 | 9 | < 0.1%
1 | 10 | < 0.1%
1.1 | 8 | < 0.1%
1.2 | 5 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
100 | 1416 | 2.8%
85.7 | 2 | < 0.1%
83.3 | 3 | < 0.1%
80 | 11 | < 0.1%
75 | 34 | 0.1%
71.4 | 5 | < 0.1%
66.7 | 168 | 0.3%
64.7 | 1 | < 0.1%
63.6 | 1 | < 0.1%
62.5 | 1 | < 0.1%

ultimate_black_user
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.0 KiB

Value | Count | Frequency (%)
False | 31144 | 62.3%
True | 18853 | 37.7%

weekday_pct
Real number (ℝ≥0)

ZEROS

Distinct: 666
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 60.92629958
Minimum: 0
Maximum: 100
Zeros: 9202
Zeros (%): 18.4%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 33.3
Median: 66.7
Q3: 100
95-th percentile: 100
Maximum: 100
Range: 100
Interquartile range (IQR): 66.7

Descriptive statistics

Standard deviation: 37.08116998
Coefficient of variation (CV): 0.6086233735
Kurtosis: -1.154170146
Mean: 60.92629958
Median Absolute Deviation (MAD): 33.3
Skewness: -0.4777817161
Sum: 3046132.2
Variance: 1375.013167
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
100 | 16658 | 33.3%
0 | 9202 | 18.4%
50 | 4057 | 8.1%
66.7 | 2088 | 4.2%
33.3 | 1619 | 3.2%
75 | 1104 | 2.2%
60 | 772 | 1.5%
25 | 723 | 1.4%
80 | 668 | 1.3%
40 | 593 | 1.2%
Other values (656) | 12513 | 25.0%

Minimum 10 values

Value | Count | Frequency (%)
0 | 9202 | 18.4%
4 | 1 | < 0.1%
5 | 1 | < 0.1%
5.9 | 1 | < 0.1%
6.3 | 3 | < 0.1%
6.7 | 4 | < 0.1%
7.1 | 4 | < 0.1%
7.7 | 8 | < 0.1%
8 | 1 | < 0.1%
8.3 | 7 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
100 | 16658 | 33.3%
99 | 1 | < 0.1%
98.9 | 2 | < 0.1%
98.5 | 1 | < 0.1%
98.4 | 2 | < 0.1%
98.3 | 1 | < 0.1%
98.2 | 2 | < 0.1%
98.1 | 2 | < 0.1%
98 | 3 | < 0.1%
97.8 | 1 | < 0.1%

avg_dist
Real number (ℝ≥0)

Distinct: 2906
Distinct (%): 5.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.791316679
Minimum: 0
Maximum: 79.69
Zeros: 150
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1.2
Q1: 2.42
Median: 3.88
Q3: 6.94
95-th percentile: 16.77
Maximum: 79.69
Range: 79.69
Interquartile range (IQR): 4.52

Descriptive statistics

Standard deviation: 5.63790757
Coefficient of variation (CV): 0.9735104956
Kurtosis: 14.64981878
Mean: 5.791316679
Median Absolute Deviation (MAD): 1.82
Skewness: 2.966674369
Sum: 289548.46
Variance: 31.78600177
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 150 | 0.3%
2.3 | 116 | 0.2%
2.29 | 116 | 0.2%
2.36 | 114 | 0.2%
2.7 | 114 | 0.2%
2.73 | 114 | 0.2%
2.5 | 113 | 0.2%
2.65 | 113 | 0.2%
2.83 | 110 | 0.2%
2.54 | 110 | 0.2%
Other values (2896) | 48827 | 97.7%

Minimum 10 values

Value | Count | Frequency (%)
0 | 150 | 0.3%
0.01 | 38 | 0.1%
0.02 | 14 | < 0.1%
0.03 | 6 | < 0.1%
0.04 | 12 | < 0.1%
0.05 | 7 | < 0.1%
0.06 | 1 | < 0.1%
0.07 | 5 | < 0.1%
0.08 | 5 | < 0.1%
0.09 | 2 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
79.69 | 1 | < 0.1%
79.34 | 1 | < 0.1%
77.13 | 1 | < 0.1%
73.88 | 1 | < 0.1%
72.2 | 1 | < 0.1%
72.08 | 1 | < 0.1%
71.38 | 1 | < 0.1%
70.48 | 1 | < 0.1%
63.15 | 1 | < 0.1%
62.14 | 1 | < 0.1%
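
The histograms in this report use fixed size bins. A minimal sketch of equal-width binning, assuming bins span the observed minimum to maximum (the function name and toy values are hypothetical, and the report's plotting code may place bin edges differently). With avg_dist ranging from 0 to 79.69, 50 such bins would each be about 1.59 wide.

```python
def fixed_width_histogram(values, bins):
    """Count values in `bins` equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: all values identical, so use a single bin.
        return [len(values)], 0.0
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum is folded into the last bin instead of opening a new one.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, width

# Hypothetical toy values, not rows from the dataset.
counts, width = fixed_width_histogram([0, 1, 2, 3, 4, 10], bins=5)
print(counts, width)

# Bin width implied by the avg_dist range and bins=50.
avg_dist_width = (79.69 - 0) / 50
print(round(avg_dist_width, 4))
```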

avg_rating_by_driver
Real number (ℝ≥0)

Distinct: 27
Distinct (%): 0.1%
Missing: 200
Missing (%): 0.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.778153302
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 4.7
Median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 0.3

Descriptive statistics

Standard deviation: 0.4466596459
Coefficient of variation (CV): 0.09347955531
Kurtosis: 24.22735113
Mean: 4.778153302
Median Absolute Deviation (MAD): 0
Skewness: -4.128826827
Sum: 237937.7
Variance: 0.1995048393
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)
Common values

Value | Count | Frequency (%)
5 | 28507 | 57.0%
4.8 | 4536 | 9.1%
4.7 | 3330 | 6.7%
4.9 | 3094 | 6.2%
4.5 | 2424 | 4.8%
4.6 | 2078 | 4.2%
4 | 1914 | 3.8%
4.3 | 1018 | 2.0%
4.4 | 860 | 1.7%
3 | 602 | 1.2%
Other values (17) | 1434 | 2.9%

Minimum 10 values

Value | Count | Frequency (%)
1 | 181 | 0.4%
1.5 | 4 | < 0.1%
2 | 126 | 0.3%
2.3 | 1 | < 0.1%
2.5 | 31 | 0.1%
2.7 | 2 | < 0.1%
2.8 | 3 | < 0.1%
3 | 602 | 1.2%
3.2 | 2 | < 0.1%
3.3 | 47 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
5 | 28507 | 57.0%
4.9 | 3094 | 6.2%
4.8 | 4536 | 9.1%
4.7 | 3330 | 6.7%
4.6 | 2078 | 4.2%
4.5 | 2424 | 4.8%
4.4 | 860 | 1.7%
4.3 | 1018 | 2.0%
4.2 | 342 | 0.7%
4.1 | 125 | 0.3%

retention
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 49.0 KiB

Value | Count | Frequency (%)
False | 38305 | 76.6%
True | 11692 | 23.4%

Interactions

[Figure: interaction plots (64 panels)]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
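
The calculation described above can be sketched in a few lines: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. This is an illustrative stdlib implementation with hypothetical function names, assuming tied values share the mean of their rank positions; it is not the report's own code.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson correlation applied to the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson(x, y)
print(f"rho = {rho:.3f}, r = {r:.3f}")
```

On this toy input ρ is 1 because the ordering is perfectly preserved, while Pearson's r is below 1 because the relationship is not linear.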

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
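
A short sketch of this formula, together with a check of the invariance property mentioned above. The data are hypothetical toy values; the scale factors used are positive, and a negative factor would flip the sign of r.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Hypothetical toy values, not rows from the dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
# Shift and rescale each variable separately; r should not change.
r_transformed = pearson_r([2 * v + 3 for v in x], [5 * v - 1 for v in y])
# Negating one variable flips the sign of r.
r_negated = pearson_r(x, [-v for v in y])

print(f"r = {r:.4f}, transformed = {r_transformed:.4f}, negated = {r_negated:.4f}")
```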

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
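
The pair-counting definition can be written directly as code. The sketch below implements that plain formula, usually called tau-a, in which tied pairs count as neither concordant nor discordant. Which tie-handling variant the report's software applies is not stated here, so treat this as an illustration of the definition rather than a reproduction of the reported values.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, ignoring ties in the numerator."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical toy values: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(round(tau, 4))
```

With four observations there are six pairs; five are concordant and one is discordant, giving (5 - 1) / 6.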

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
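
A sketch of the bias-corrected estimator as it is commonly written: the chi-squared statistic is divided by the sample size, a correction term is subtracted, and the row and column counts are adjusted before normalising. The function name and the toy contingency tables are hypothetical, and it is an assumption that the report's implementation follows this form line for line.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi squared and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Hypothetical 2x2 tables: perfect association and exact independence.
v_perfect = cramers_v_corrected([[10, 0], [0, 10]])
v_independent = cramers_v_corrected([[5, 5], [5, 5]])
print(round(v_perfect, 6), round(v_independent, 6))
```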

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
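
Nullity correlation can be read as an ordinary correlation between presence indicators: each column is turned into a 0/1 series (1 where a value is present) and the two series are correlated. A minimal sketch under that assumption, with a hypothetical function name and toy columns rather than rows from the dataset:

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns."""
    a = [0 if v is None else 1 for v in col_a]   # 1 = value present
    b = [0 if v is None else 1 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return None  # undefined when a column is always present or always missing
    return cov / math.sqrt(var_a * var_b)

# Hypothetical toy columns, not rows from the dataset.
rating_of_driver = [None, None, 4.7, 5.0, 4.3, 4.9]
rating_by_driver = [None, None, 5.0, 4.8, 5.0, 4.7]
phone = ["iPhone", "Android", None, None, "iPhone", "Android"]

same_pattern = nullity_correlation(rating_of_driver, rating_by_driver)
mixed_pattern = nullity_correlation(rating_of_driver, phone)
print(round(same_pattern, 3), round(mixed_pattern, 3))
```

A value near 1 means the two columns tend to be missing in the same rows; a negative value means one tends to be present where the other is missing.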

Sample

First rows

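(index) | df_index | city | trips_in_first_30_days | signup_date | avg_rating_of_driver | avg_surge | last_trip_date | phone | surge_pct | ultimate_black_user | weekday_pct | avg_dist | avg_rating_by_driver | retention
0 | 0 | King's Landing | 4 | 2014-01-25 | 4.7 | 1.10 | 2014-06-17 | iPhone | 15.4 | True | 46.2 | 3.67 | 5.0 | False
1 | 1 | Astapor | 0 | 2014-01-29 | 5.0 | 1.00 | 2014-05-05 | Android | 0.0 | False | 50.0 | 8.26 | 5.0 | False
2 | 2 | Astapor | 3 | 2014-01-06 | 4.3 | 1.00 | 2014-01-07 | iPhone | 0.0 | False | 100.0 | 0.77 | 5.0 | False
3 | 3 | King's Landing | 9 | 2014-01-10 | 4.6 | 1.14 | 2014-06-29 | iPhone | 20.0 | True | 80.0 | 2.36 | 4.9 | True
4 | 4 | Winterfell | 14 | 2014-01-27 | 4.4 | 1.19 | 2014-03-15 | Android | 11.8 | False | 82.4 | 3.13 | 4.9 | False
5 | 5 | Winterfell | 2 | 2014-01-09 | 3.5 | 1.00 | 2014-06-06 | iPhone | 0.0 | True | 100.0 | 10.56 | 5.0 | False
6 | 6 | Astapor | 1 | 2014-01-24 | NaN | 1.00 | 2014-01-25 | Android | 0.0 | False | 100.0 | 3.95 | 4.0 | False
7 | 7 | Winterfell | 2 | 2014-01-28 | 5.0 | 1.00 | 2014-01-29 | iPhone | 0.0 | False | 100.0 | 2.04 | 5.0 | False
8 | 8 | Winterfell | 2 | 2014-01-21 | 4.5 | 1.00 | 2014-02-01 | Android | 0.0 | False | 100.0 | 4.36 | 5.0 | False
9 | 9 | Winterfell | 1 | 2014-01-03 | NaN | 1.00 | 2014-01-05 | Android | 0.0 | False | 0.0 | 2.37 | 5.0 | False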

Last rows

(index) | df_index | city | trips_in_first_30_days | signup_date | avg_rating_of_driver | avg_surge | last_trip_date | phone | surge_pct | ultimate_black_user | weekday_pct | avg_dist | avg_rating_by_driver | retention
49987 | 49990 | Astapor | 1 | 2014-01-13 | 4.7 | 1.08 | 2014-05-18 | iPhone | 33.3 | True | 33.3 | 3.38 | 5.0 | False
49988 | 49991 | Winterfell | 0 | 2014-01-08 | 5.0 | 1.25 | 2014-06-29 | iPhone | 100.0 | False | 0.0 | 1.06 | 5.0 | True
49989 | 49992 | King's Landing | 1 | 2014-01-18 | 1.0 | 1.00 | 2014-01-19 | iPhone | 0.0 | False | 0.0 | 7.58 | 5.0 | False
49990 | 49993 | Astapor | 3 | 2014-01-03 | 4.8 | 1.11 | 2014-07-01 | iPhone | 11.1 | True | 55.6 | 2.53 | 4.7 | True
49991 | 49994 | Astapor | 1 | 2014-01-03 | 4.6 | 1.44 | 2014-05-31 | iPhone | 37.5 | False | 25.0 | 2.25 | 4.5 | False
49992 | 49995 | King's Landing | 0 | 2014-01-25 | 5.0 | 1.00 | 2014-06-05 | iPhone | 0.0 | False | 100.0 | 5.63 | 4.2 | False
49993 | 49996 | Astapor | 1 | 2014-01-24 | NaN | 1.00 | 2014-01-25 | iPhone | 0.0 | False | 0.0 | 0.00 | 4.0 | False
49994 | 49997 | Winterfell | 0 | 2014-01-31 | 5.0 | 1.00 | 2014-05-22 | Android | 0.0 | True | 100.0 | 3.86 | 5.0 | False
49995 | 49998 | Astapor | 2 | 2014-01-14 | 3.0 | 1.00 | 2014-01-15 | iPhone | 0.0 | False | 100.0 | 4.58 | 3.5 | False
49996 | 49999 | Astapor | 0 | 2014-01-18 | NaN | 1.00 | 2014-04-20 | Android | 0.0 | False | 0.0 | 3.49 | 5.0 | False